Text Categorization based on Associative Classification

Authors

  • Padmavati Shrivastava
  • Uzma Ansari
Abstract

Text mining is an emerging technology that can augment existing data in corporate databases by making unstructured text available for analysis. The rapid growth in online documents, driven largely by the expanding internet, has renewed interest in automated document classification and data mining, and the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Text classification is the process of assigning documents to predefined categories based on their content. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. Both classification and association rule mining are indispensable to practical applications. For association rule mining, the target of discovery is not predetermined, while for classification rule mining there is one and only one predetermined target. Thus, great savings and conveniences to the user could result if the two mining techniques can be integrated. In this paper, such an integrated framework, called associative classification, is used for text categorization. The algorithm presented here uses words as features and derives a feature set from preclassified text documents. A Naïve Bayes classifier is then applied to the derived features for final classification.
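The abstract gives only the outline of the method, so the following is a minimal sketch of that outline under stated assumptions, not the authors' algorithm. It assumes that the feature set is the union of words appearing in word sets that are frequent within a class (found here by brute-force enumeration rather than an Apriori-style search), and that the final step is a simplified Naïve Bayes over word presence with Laplace smoothing. All function names, the `min_support` threshold, and the toy corpus `TRAIN` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def tokenize(text):
    """Lowercase, whitespace-split, and return the set of distinct words."""
    return set(text.lower().split())


def frequent_wordsets(docs, min_support, max_size=2):
    """Return word sets (as sorted tuples) occurring in at least
    min_support (a fraction) of the given tokenized documents.
    Brute-force enumeration; fine for a toy corpus only."""
    counts = Counter()
    for words in docs:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(words), size):
                counts[combo] += 1
    n = len(docs)
    return {ws for ws, c in counts.items() if c / n >= min_support}


def derive_features(labeled_docs, min_support=0.5, max_size=2):
    """Assumed feature derivation: every word that belongs to a word set
    that is frequent within at least one class."""
    by_class = defaultdict(list)
    for text, label in labeled_docs:
        by_class[label].append(tokenize(text))
    features = set()
    for docs in by_class.values():
        for wordset in frequent_wordsets(docs, min_support, max_size):
            features.update(wordset)
    return features


def train_nb(labeled_docs, features):
    """Count documents per class and, per class, documents containing
    each derived feature word."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in tokenize(text) & features:
            word_counts[label][word] += 1
    return class_counts, word_counts


def classify(text, model, features):
    """Pick the class with the highest log prior plus smoothed log
    likelihoods of the feature words present in the text.
    Simplification: absent words contribute nothing to the score."""
    class_counts, word_counts = model
    total = sum(class_counts.values())
    words = tokenize(text) & features
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for word in words:
            score += math.log((word_counts[label][word] + 1) / (n + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical preclassified corpus, for illustration only.
TRAIN = [
    ("stock market shares rise", "finance"),
    ("market shares fall on bank news", "finance"),
    ("team wins football match", "sports"),
    ("football team loses match", "sports"),
]

if __name__ == "__main__":
    feats = derive_features(TRAIN, min_support=1.0)
    model = train_nb(TRAIN, feats)
    print(sorted(feats))
    print(classify("bank shares and market outlook", model, feats))
    print(classify("football match tonight", model, feats))
```

Restricting Naïve Bayes to words drawn from frequent per-class word sets is one plausible reading of "derived features"; a fuller associative classifier would typically also rank rules by confidence, which this sketch omits.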

Similar resources

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods are needed to search, filter and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

Associative Classification in Text Categorization

Text categorization has become one of the key techniques for handling and organizing text data. This model is used to classify a new article into its most relevant category. In this paper, we propose a novel associative classification algorithm, ACTC, for text categorization. ACTC aims at extracting the k-best strong correlated positive and negative association rules directly from training set for cl...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides for its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have overtaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

ACNB: Associative Classification Mining Based on Naïve Bayesian Method

Integrating association rule discovery and classification in data mining brings a new approach known as associative classification. Associative classification is a promising approach that often constructs more accurate classification models (classifiers) than the traditional classification approaches such as decision trees and rule induction. In this research, the authors investigate the use of...

Benefits of Associative Classification within Text Categorisation

Associative Classification has been successfully employed in many diverse classification problem domains, showing high classification accuracy and adequate computation time relative to the other traditionally used solutions. Despite this, very little research has been conducted with it in the problem area of Text Categorisation and only a small number of approaches presently exist that are base...

Publication date: 2010